Smart Paper Notes

What's a good imputation to predict with missing values?

Marine Le Morvan , Julie Josse , Erwan Scornet , Gaël Varoquaux

Categories: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2021-06-01

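How can one learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and then learning on the completed data to predict the outcome. Yet this widespread practice has no theoretical grounding. Here we show that, for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-value mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that good prediction does not require perfect conditional imputation. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting the imputation instead so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network that captures the conditional links between observed and unobserved variables whatever the missing-value pattern. Experiments confirm that joint imputation and regression through NeuMiss outperforms various two-step procedures in our experiments with a finite number of samples.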
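As a rough illustration of the "impute" half of an impute-then-regress pipeline, the sketch below fills missing entries with column means. It is a minimal example under stated assumptions, not the paper's NeuMiss procedure: the function name `mean_impute`, the use of `None` to mark missing values, and the toy data are all hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 when a column is never observed.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy data (hypothetical): None marks a missing value.
X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
X_completed = mean_impute(X)
```

The completed matrix `X_completed` would then be passed to any regressor, which is the "regress" step that the abstract argues can compensate for an imperfect imputation when the learner is powerful enough.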


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


GAMMA: Generative Augmentation for Attentive Marine Debris Detection

Vaishnavi Khindkar , Janhavi Khindkar

Categories: Computer Vision

2022-12-07

We propose an efficient generative augmentation approach to address the scarcity of underwater debris data for visual detection. We use CycleGAN as a data augmentation technique to convert openly available, abundant images of terrestrial plastic into underwater-style images. Prior works focus only on augmenting or enhancing existing data, which adds bias to the dataset. Our technique, by contrast, introduces variation by transferring additional in-air plastic data to a marine background. We also propose a novel architecture for underwater debris detection that uses an attention mechanism. Our method helps the detector focus only on relevant instances in the image, which improves detection performance, a property that is highly desirable when detecting marine debris with an Autonomous Underwater Vehicle (AUV). We perform extensive experiments on marine debris detection using our approach. Quantitative and qualitative results demonstrate the potential of our framework, which significantly outperforms state-of-the-art methods.


Towards Automatic Cetacean Photo-Identification: A Framework for Fine-Grain, Few-Shot Learning in Marine Ecology

Cameron Trotter , Nick Wright , A. Stephen McGough , Matt Sharpe , Barbara Cheney , Mònica Arso Civil , Reny Tyson Moore , Jason Allen , Per Berggren

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-07

Photo-identification (photo-id) is one of the main non-invasive capture-recapture methods utilised by marine researchers for monitoring cetacean (dolphin, whale, and porpoise) populations. This method has historically been performed manually, resulting in high workload and cost due to the vast number of images collected. Recently, automated aids have been developed to help speed up photo-id, although they are often disjoint in their processing and do not utilise all available identifying information. The work presented in this paper aims to create a fully automatic photo-id aid capable of providing the most likely matches based on all available information, without the need for data pre-processing such as cropping. This is achieved through a pipeline of computer vision models and post-processing techniques aimed at detecting cetaceans in unedited field imagery before passing them downstream for individual-level catalogue matching. The system is capable of handling previously uncatalogued individuals and flagging these for investigation thanks to catalogue similarity comparison. We evaluate the system against multiple real-life photo-id catalogues, achieving mAP@IOU[0.5] = 0.91, 0.96 for the task of dorsal fin detection on catalogues from Tanzania and the UK respectively and 83.1, 97.5% top-10 accuracy for the task of individual classification on catalogues from the UK and USA.
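The detection metric mAP@IOU[0.5] counts a predicted box as correct only when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. The sketch below shows that matching criterion for axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` box layout and the function names are assumptions made for illustration, not taken from the paper.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as a true positive when IoU >= threshold."""
    return iou(pred, truth) >= threshold
```

Average precision is then computed over the ranked detections using this match criterion, and mAP averages it across classes.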


Application of the YOLOv5 Model for the Detection of Microobjects in the Marine Environment

Aleksandr N. Grekov , Yurii E. Shishkin , Sergei S. Peliushenko , Aleksandr S. Mavrin

Categories: Computer Vision | Machine Learning | Neural and Evolutionary Computing

2022-11-28

The efficiency of using the YOLOv5 machine learning model for solving the problem of automatic detection and recognition of micro-objects in the marine environment is studied. Samples of microplankton and microplastics were prepared, according to which a database of classified images was collected for training an image recognition neural network. The results of experiments using a trained network to find micro-objects in photo and video images in real time are presented. Experimental studies have shown high efficiency, comparable to manual recognition, of the proposed model in solving problems of detecting micro-objects in the marine environment.


Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval

Quang-Trung Truong , Tuan-Anh Vu , Tan-Sang Ha , Lokoc Jakub , Yue Him Wong Tim , Ajay Joneja , Sai-Kit Yeung

Categories: Computer Vision

2022-09-23

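Effective analysis of unusual domain-specific video collections is an important practical problem, where state-of-the-art general-purpose models still face limitations. Hence, it is desirable to design benchmark datasets that challenge novel powerful models for specific domains with additional constraints. It is important to remember that domain-specific data may be noisier (e.g., endoscopic or underwater videos) and often requires more experienced users for effective search. In this paper, we focus on single-shot videos taken from moving cameras in underwater environments, which constitute a nontrivial challenge for research purposes. The first shard of a new Marine Video Kit dataset is presented to serve video retrieval and other computer vision challenges. In addition to basic metadata statistics, we present several insights and reference graphs based on low-level features as well as semantic annotations of selected keyframes. The analysis also contains experiments showing the limitations of respected general-purpose models for retrieval.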


Adaptive and Collaborative Bathymetric Channel-Finding Approach for Multiple Autonomous Marine Vehicle

Nikolai Gershfeld , Tyler M Paine , Michael R. Benjamin

Categories: Robotics

2022-09-20

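This paper reports an investigation into the problem of rapidly identifying a channel using one or more Unmanned Surface Vehicles (USVs). A new algorithm called Proposal-Based Adaptive Channel Search (PBACS) is presented as a potential solution that improves upon current methods. The empirical performance of PBACS is compared with lawnmower surveying and with Markov decision process (MDP) planning using two state-of-the-art reward functions: Upper Confidence Bound (UCB) and Maximum Value Information (MVI). The performance of each method is evaluated by comparing the time it takes to identify a continuous channel using one, two, three, or four USVs. Each method is compared across ten simulated bathymetry scenarios and one field area, each with a different channel layout. The results from simulations and field trials indicate that, on average, multi-vehicle PBACS outperforms lawnmower, UCB, and MVI-based methods, especially when at least three vehicles are used.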
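For readers unfamiliar with the lawnmower baseline mentioned above, the sketch below generates the turning points of a back-and-forth sweep over a rectangular area. It is only a generic illustration: the rectangular survey region, the parameter names, and the function `lawnmower_waypoints` are assumptions, and the paper's actual survey setup may differ.

```python
from typing import List, Tuple


def lawnmower_waypoints(
    width: float, height: float, spacing: float
) -> List[Tuple[float, float]]:
    """Turning points of a back-and-forth sweep over [0, width] x [0, height].

    Rows run along x and are separated by `spacing` in y.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    n_rows = int(height // spacing) + 1
    waypoints = []
    for i in range(n_rows):
        y = i * spacing
        # Alternate the sweep direction on every row.
        start_x, end_x = (0.0, width) if i % 2 == 0 else (width, 0.0)
        waypoints.append((start_x, y))
        waypoints.append((end_x, y))
    return waypoints


path = lawnmower_waypoints(width=2.0, height=2.0, spacing=1.0)
```

Such a pattern covers the area uniformly regardless of what has been measured so far, which is why adaptive methods are compared against it.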


Facilitating Global Team Meetings Between Language-Based Subgroups: When and How Can Machine Translation Help?

Yongle Zhang , Dennis Asamoah Owusu , Marine Carpuat , Ge Gao

Categories: Natural Language Processing

2022-09-07

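Global teams frequently consist of language-based subgroups that put together complementary information to achieve common goals. Previous research outlines a two-step work communication flow in these teams. There are team meetings that use a required common language (i.e., English); in preparation for those meetings, people have subgroup conversations in their native languages. Work communication at team meetings is often less effective than in subgroup conversations. In the current study, we investigate the idea of leveraging machine translation (MT) to facilitate global team meetings. We hypothesize that exchanging subgroup conversation logs before a team meeting offers contextual information that benefits teamwork. MT can translate these logs, which enables comprehension at low cost. To test our hypothesis, we conducted a between-subjects experiment in which twenty quartets of participants performed a personnel selection task. Each quartet included two native English speakers (NS) and two non-native speakers (NNS) whose native language was Mandarin. All participants began the task with subgroup conversations in their native languages and then proceeded to team meetings in English. We manipulated the exchange of subgroup conversation logs before team meetings: with MT-mediated exchanges versus without. Analysis of participants' subjective experience, task performance, and depth of discussion, as reflected in their conversational moves, indicates that team meeting quality improved when there were MT-mediated exchanges of subgroup conversation logs as opposed to no exchanges. We conclude with reflections on when and how MT could be applied to enhance global teamwork across a language barrier.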


Underwater autonomous mapping and characterization of marine debris in urban water bodies

Trygve Olav Fossum , Øystein Sture , Petter Norgren-Aamot , Ingrid Myrnes Hansen , Bjørn Christian Kvisvik , Anne Christine Knag

Categories: Robotics

2022-08-01

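Marine debris originating from human activity has been accumulating in underwater environments such as oceans, lakes, and rivers for decades. The extent, type, and amount of waste are hard to assess because the exact mechanisms of its spread are not understood, with unknown consequences for the marine environment and human health. Methods for detecting and mapping marine debris are therefore vital for gaining insight into pollution dynamics, which in turn can be used to effectively plan and execute physical removal. Using an autonomous underwater vehicle (AUV) equipped with an underwater hyperspectral imager (UHI) and a stereo camera, marine debris was autonomously detected, mapped, and quantified in the sheltered bay Store Lungegaardsvann in Bergen, Norway.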


A Transfer Learning-Based Approach to Marine Vessel Re-Identification

Guangmiao Zeng , Wanneng Yu , Rongjie Wang , Anhui Lin

Categories: Computer Vision

2022-07-29

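Marine vessel re-identification technology is an important component of intelligent shipping systems and an important part of the visual perception tasks required for marine surveillance. However, unlike the situation on land, the maritime environment is complex and variable and offers fewer samples, so vessel re-identification at sea is more difficult. This paper therefore proposes a transfer dynamic alignment algorithm and simulates the swaying of vessels at sea, using well-camouflaged and similar warships as test targets to increase the recognition difficulty and thus cope with the impact of complex sea conditions. It also discusses the effect of using different types of vessels as transfer objects. The experimental results show that the improved algorithm raises the mean average precision (mAP) by 10.2% and the first hit rate (Rank1) by 4.9% on average.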
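The two reported metrics, mAP and Rank1, are standard in re-identification. The sketch below shows one common way to compute them from ranked retrieval results; it is a generic illustration rather than the paper's evaluation code, and the input format (one list of booleans per query, ordered by decreasing similarity, covering the whole gallery) is an assumption.

```python
from typing import List, Tuple


def average_precision(ranked_hits: List[bool]) -> float:
    """Mean of the precision values at each rank where a correct match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(queries: List[List[bool]]) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mean average precision) over all queries."""
    # Rank-1: fraction of queries whose top-ranked gallery item is correct.
    rank1 = sum(1 for q in queries if q and q[0]) / len(queries)
    mean_ap = sum(average_precision(q) for q in queries) / len(queries)
    return rank1, mean_ap


# Hypothetical results for two queries: True marks a gallery item of the same vessel.
rank1, mean_ap = rank1_and_map([[True, False, True], [False, True]])
```

Rank1 rewards only the top result, whereas mAP also reflects how highly the remaining correct matches are ranked, which is why the two can improve by different amounts.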
